DROPS

Document

DOI: 10.4230/LIPIcs.MFCS.2018.66

Fast Entropy-Bounded String Dictionary Look-Up with Mismatches

Authors: Pawel Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)

Abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of d strings of length m and an integer k, we must preprocess it into a data structure to answer the following queries: Given a query string Q of length m, find all strings in the dictionary that are at Hamming distance at most k from Q. Chan and Lewenstein (CPM 2015) showed a data structure for k = 1 with optimal query time O(m/w + occ), where w is the size of a machine word and occ is the size of the output. The data structure occupies O(w d log^{1+epsilon} d) extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values k. Namely, we give a data structure that has O(m/w + log^k d + occ) query time and uses O(w d log^k d) extra bits of space.

Cite as

Pawel Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya. Fast Entropy-Bounded String Dictionary Look-Up with Mismatches. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 66:1-66:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.MFCS.2018.66,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Starikovskaya, Tatiana},
  title =	{{Fast Entropy-Bounded String Dictionary Look-Up with Mismatches}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{66:1--66:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.66},
  URN =		{urn:nbn:de:0030-drops-96486},
  doi =		{10.4230/LIPIcs.MFCS.2018.66},
  annote =	{Keywords: Dictionary look-up, Hamming distance, compact data structures}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.16

Near-Optimal Distance Emulator for Planar Graphs

Authors: Hsien-Chih Chang, Pawel Gawrychowski, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

Given a graph G and a set of terminals T, a distance emulator of G is another graph H (not necessarily a subgraph of G) containing T, such that all the pairwise distances in G between vertices of T are preserved in H. An important open question is to find the smallest possible distance emulator. We prove that, given any subset of k terminals in an n-vertex undirected unweighted planar graph, we can construct in O~(n) time a distance emulator of size O~(min(k^2,sqrt{k * n})). This is optimal up to logarithmic factors. The existence of such distance emulator provides a straightforward framework to solve distance-related problems on planar graphs: Replace the input graph with the distance emulator, and apply whatever algorithm available to the resulting emulator. In particular, our result implies that, on any unweighted undirected planar graph, one can compute all-pairs shortest path distances among k terminals in O~(n) time when k=O(n^{1/3}).

Cite as

Hsien-Chih Chang, Pawel Gawrychowski, Shay Mozes, and Oren Weimann. Near-Optimal Distance Emulator for Planar Graphs. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{chang_et_al:LIPIcs.ESA.2018.16,
  author =	{Chang, Hsien-Chih and Gawrychowski, Pawel and Mozes, Shay and Weimann, Oren},
  title =	{{Near-Optimal Distance Emulator for Planar Graphs}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.16},
  URN =		{urn:nbn:de:0030-drops-94796},
  doi =		{10.4230/LIPIcs.ESA.2018.16},
  annote =	{Keywords: planar graphs, shortest paths, metric compression, distance preservers, distance emulators, distance oracles}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.33

Edit Distance with Block Operations

Authors: Michal Ganczorz, Pawel Gawrychowski, Artur Jez, and Tomasz Kociumaka

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

We consider the problem of edit distance in which block operations are allowed, i.e. we ask for the minimal number of (block) operations that are needed to transform a string s to t. We give O(log n) approximation algorithms, where n is the total length of the input strings, for the variants of the problem which allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (involution is a generalisation of the reversal). Previously, algorithms only for the first and last variant were known, and they had approximation ratios O(log n log^*n) and O(log n (log^*n)^2), respectively. The edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s, t and the goal is to partition s into minimal number of parts such that they can be permuted in order to obtain t. Thus we also obtain an O(log n) approximation for this problem (compared to the previous O(log n log^* n)). The results use a simplification of the previously used technique of locally consistent parsing, which groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on a deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for block copy or block delete) use a similar algorithm, but the analysis is based on a specially tuned combinatorial function on sets of numbers.

Cite as

Michal Ganczorz, Pawel Gawrychowski, Artur Jez, and Tomasz Kociumaka. Edit Distance with Block Operations. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{ganczorz_et_al:LIPIcs.ESA.2018.33,
  author =	{Ganczorz, Michal and Gawrychowski, Pawel and Jez, Artur and Kociumaka, Tomasz},
  title =	{{Edit Distance with Block Operations}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{33:1--33:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.33},
  URN =		{urn:nbn:de:0030-drops-94963},
  doi =		{10.4230/LIPIcs.ESA.2018.33},
  annote =	{Keywords: Edit distance, Block operations, Common string partition}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.45

Edit Distance between Unrooted Trees in Cubic Time

Authors: Bartlomiej Dudek and Pawel Gawrychowski

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Edit distance between trees is a natural generalization of the classical edit distance between strings, in which the allowed elementary operations are contraction, uncontraction and relabeling of an edge. Demaine et al. [ACM Trans. on Algorithms, 6(1), 2009] showed how to compute the edit distance between rooted trees on n nodes in O(n^3) time. However, generalizing their method to unrooted trees seems quite problematic, and the most efficient known solution remains to be the previous O(n^3 log n) time algorithm by Klein [ESA 1998]. Given the lack of progress on improving this complexity, it might appear that unrooted trees are simply more difficult than rooted trees. We show that this is, in fact, not the case, and edit distance between unrooted trees on n nodes can be computed in O(n^3) time. A significantly faster solution is unlikely to exist, as Bringmann et al. [SODA 2018] proved that the complexity of computing the edit distance between rooted trees cannot be decreased to O(n^{3-epsilon}) unless some popular conjecture fails, and the lower bound easily extends to unrooted trees. We also show that for two unrooted trees of size m and n, where m <=n, our algorithm can be modified to run in O(nm^2(1+log(n/m))). This, again, matches the complexity achieved by Demaine et al. for rooted trees, who also showed that this is optimal if we restrict ourselves to the so-called decomposition algorithms.

Cite as

Bartlomiej Dudek and Pawel Gawrychowski. Edit Distance between Unrooted Trees in Cubic Time. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 45:1-45:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.ICALP.2018.45,
  author =	{Dudek, Bartlomiej and Gawrychowski, Pawel},
  title =	{{Edit Distance between Unrooted Trees in Cubic Time}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{45:1--45:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.45},
  URN =		{urn:nbn:de:0030-drops-90492},
  doi =		{10.4230/LIPIcs.ICALP.2018.45},
  annote =	{Keywords: tree edit distance, dynamic programming, heavy light decomposition}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.61

Improved Bounds for Shortest Paths in Dense Distance Graphs

Authors: Pawel Gawrychowski and Adam Karczmarz

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

We study the problem of computing shortest paths in so-called dense distance graphs, a basic building block for designing efficient planar graph algorithms. Let G be a plane graph with a distinguished set partial{G} of boundary vertices lying on a constant number of faces of G. A distance clique of G is a complete graph on partial{G} encoding all-pairs distances between these vertices. A dense distance graph is a union of possibly many unrelated distance cliques. Fakcharoenphol and Rao [Fakcharoenphol and Rao, 2006] proposed an efficient implementation of Dijkstra's algorithm (later called FR-Dijkstra) computing single-source shortest paths in a dense distance graph. Their algorithm spends O(b log^2{n}) time per distance clique with b vertices, even though a clique has b^2 edges. Here, n is the total number of vertices of the dense distance graph. The invention of FR-Dijkstra was instrumental in obtaining such results for planar graphs as nearly-linear time algorithms for multiple-source-multiple-sink maximum flow and dynamic distance oracles with sublinear update and query bounds. At the heart of FR-Dijkstra lies a data structure updating distance labels and extracting minimum labeled vertices in O(log^2{n}) amortized time per vertex. We show an improved data structure with O((log^2{n})/(log^2 log n)) amortized bounds. This is the first improvement over the data structure of Fakcharoenphol and Rao in more than 15 years. It yields improved bounds for all problems on planar graphs, for which computing shortest paths in dense distance graphs is currently a bottleneck.

Cite as

Pawel Gawrychowski and Adam Karczmarz. Improved Bounds for Shortest Paths in Dense Distance Graphs. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 61:1-61:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2018.61,
  author =	{Gawrychowski, Pawel and Karczmarz, Adam},
  title =	{{Improved Bounds for Shortest Paths in Dense Distance Graphs}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{61:1--61:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.61},
  URN =		{urn:nbn:de:0030-drops-90654},
  doi =		{10.4230/LIPIcs.ICALP.2018.61},
  annote =	{Keywords: shortest paths, dense distance graph, planar graph, Monge matrix}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.62

Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

Authors: Pawel Gawrychowski and Przemyslaw Uznanski

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

Cite as

Pawel Gawrychowski and Przemyslaw Uznanski. Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 62:1-62:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2018.62,
  author =	{Gawrychowski, Pawel and Uznanski, Przemyslaw},
  title =	{{Towards Unified Approximate Pattern Matching for Hamming and L\underline1 Distance}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{62:1--62:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.62},
  URN =		{urn:nbn:de:0030-drops-90669},
  doi =		{10.4230/LIPIcs.ICALP.2018.62},
  annote =	{Keywords: approximate pattern matching, conditional lower bounds, L\underline1 distance, Hamming distance}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.63

A Faster Construction of Greedy Consensus Trees

Authors: Pawel Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

A consensus tree is a phylogenetic tree that captures the similarity between a set of conflicting phylogenetic trees. The problem of computing a consensus tree is a major step in phylogenetic tree reconstruction. It is also central for predicting a species tree from a set of gene trees, as indicated recently in [Nature 2013]. This paper focuses on two of the most well-known and widely used consensus tree methods: the greedy consensus tree and the frequency difference consensus tree. Given k conflicting trees each with n leaves, the previous fastest algorithms for these problems were O(k n^2) for the greedy consensus tree [J. ACM 2016] and O~(min{k n^2, k^2n}) for the frequency difference consensus tree [ACM TCBB 2016]. We improve these running times to O~(k n^{1.5}) and O~(k n) respectively.

Cite as

Pawel Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann. A Faster Construction of Greedy Consensus Trees. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 63:1-63:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2018.63,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Sung, Wing-Kin and Weimann, Oren},
  title =	{{A Faster Construction of Greedy Consensus Trees}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{63:1--63:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.63},
  URN =		{urn:nbn:de:0030-drops-90676},
  doi =		{10.4230/LIPIcs.ICALP.2018.63},
  annote =	{Keywords: phylogenetic trees, greedy consensus trees, dynamic trees}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.64

A Faster FPTAS for #Knapsack

Authors: Pawel Gawrychowski, Liran Markin, and Oren Weimann

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Given a set W = {w_1,..., w_n} of non-negative integer weights and an integer C, the #Knapsack problem asks to count the number of distinct subsets of W whose total weight is at most C. In the more general integer version of the problem, the subsets are multisets. That is, we are also given a set {u_1,..., u_n} and we are allowed to take up to u_i items of weight w_i. We present a deterministic FPTAS for #Knapsack running in O(n^{2.5}epsilon^{-1.5}log(n epsilon^{-1})log (n epsilon)) time. The previous best deterministic algorithm [FOCS 2011] runs in O(n^3 epsilon^{-1} log(n epsilon^{-1})) time (see also [ESA 2014] for a logarithmic factor improvement). The previous best randomized algorithm [STOC 2003] runs in O(n^{2.5} sqrt{log (n epsilon^{-1})} + epsilon^{-2} n^2) time. Therefore, for the case of constant epsilon, we close the gap between the O~(n^{2.5}) randomized algorithm and the O~(n^3) deterministic algorithm. For the integer version with U = max_i {u_i}, we present a deterministic FPTAS running in O(n^{2.5}epsilon^{-1.5}log(n epsilon^{-1} log U)log (n epsilon) log^2 U) time. The previous best deterministic algorithm [TCS 2016] runs in O(n^3 epsilon^{-1}log(n epsilon^{-1} log U) log^2 U) time.

Cite as

Pawel Gawrychowski, Liran Markin, and Oren Weimann. A Faster FPTAS for #Knapsack. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 64:1-64:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2018.64,
  author =	{Gawrychowski, Pawel and Markin, Liran and Weimann, Oren},
  title =	{{A Faster FPTAS for #Knapsack}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{64:1--64:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.64},
  URN =		{urn:nbn:de:0030-drops-90687},
  doi =		{10.4230/LIPIcs.ICALP.2018.64},
  annote =	{Keywords: knapsack, approximate counting, K-approximating sets and functions}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.16

Slowing Down Top Trees for Better Worst-Case Compression

Authors: Bartlomiej Dudek and Pawel Gawrychowski

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

We consider the top tree compression scheme introduced by Bille et al. [ICALP 2013] and construct an infinite family of trees on n nodes labeled from an alphabet of size sigma, for which the size of the top DAG is Theta(n/log_sigma n log log_sigma n). Our construction matches a previously known upper bound and exhibits a weakness of this scheme, as the information-theoretic lower bound is Omega(n/log_sigma n}). This settles an open problem stated by Lohrey et al. [arXiv 2017], who designed a more involved version achieving the lower bound. We show that this can be also guaranteed by a very minor modification of the original scheme: informally, one only needs to ensure that different parts of the tree are not compressed too quickly. Arguably, our version is more uniform, and in particular, the compression procedure is oblivious to the value of sigma.

Cite as

Bartlomiej Dudek and Pawel Gawrychowski. Slowing Down Top Trees for Better Worst-Case Compression. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 16:1-16:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.CPM.2018.16,
  author =	{Dudek, Bartlomiej and Gawrychowski, Pawel},
  title =	{{Slowing Down Top Trees for Better Worst-Case Compression}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{16:1--16:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.16},
  URN =		{urn:nbn:de:0030-drops-86920},
  doi =		{10.4230/LIPIcs.CPM.2018.16},
  annote =	{Keywords: top trees, compression, tree grammars}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.40

Dispersion on Trees

Authors: Pawel Gawrychowski, Nadav Krasnopolsky, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

In the k-dispersion problem, we need to select k nodes of a given graph so as to maximize the minimum distance between any two chosen nodes. This can be seen as a generalization of the independent set problem, where the goal is to select nodes so that the minimum distance is larger than 1. We design an optimal O(n) time algorithm for the dispersion problem on trees consisting of n nodes, thus improving the previous O(n log n) time solution from 1997. We also consider the weighted case, where the goal is to choose a set of nodes of total weight at least W. We present an O(n log^2n) algorithm improving the previous O(n log^4 n) solution. Our solution builds on the search version (where we know the minimum distance lambda between the chosen nodes) for which we present tight Theta(n log n) upper and lower bounds.

Cite as

Pawel Gawrychowski, Nadav Krasnopolsky, Shay Mozes, and Oren Weimann. Dispersion on Trees. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 40:1-40:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ESA.2017.40,
  author =	{Gawrychowski, Pawel and Krasnopolsky, Nadav and Mozes, Shay and Weimann, Oren},
  title =	{{Dispersion on Trees}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{40:1--40:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.40},
  URN =		{urn:nbn:de:0030-drops-78438},
  doi =		{10.4230/LIPIcs.ESA.2017.40},
  annote =	{Keywords: parametric search, dispersion, k-center, dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.10

A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem

Authors: Bartlomiej Dudek, Pawel Gawrychowski, and Piotr Ostropolski-Nalewaja

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

In the Maximum Duo-Preservation String Mapping problem we are given two strings and wish to map the letters of the former to the letters of the latter as to maximise the number of duos. A duo is a pair of consecutive letters that is mapped to a pair of consecutive letters in the same order. This is complementary to the well-studied Minimum Common String Partition problem, where the goal is to partition the former string into blocks that can be permuted and concatenated to obtain the latter string. Maximum Duo-Preservation String Mapping is APX-hard. After a series of improvements, Brubach [WABI 2016] showed a polynomial-time 3.25-approximation algorithm. Our main contribution is that, for any eps>0, there exists a polynomial-time (2+eps)-approximation algorithm. Similarly to a previous solution by Boria et al. [CPM 2016], our algorithm uses the local search technique. However, this is used only after a certain preliminary greedy procedure, which gives us more structure and makes a more general local search possible. We complement this with a specialised version of the algorithm that achieves 2.67-approximation in quadratic time.

Cite as

Bartlomiej Dudek, Pawel Gawrychowski, and Piotr Ostropolski-Nalewaja. A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.CPM.2017.10,
  author =	{Dudek, Bartlomiej and Gawrychowski, Pawel and Ostropolski-Nalewaja, Piotr},
  title =	{{A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{10:1--10:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.10},
  URN =		{urn:nbn:de:0030-drops-73458},
  doi =		{10.4230/LIPIcs.CPM.2017.10},
  annote =	{Keywords: approximation scheme, minimum common string partition, local search}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2016.35

LZ77 Factorisation of Trees

Authors: Pawel Gawrychowski and Artur Jez

Published in: LIPIcs, Volume 65, 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)

Abstract

We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments that either consist of one node or has already occurred earlier (in the BFS order). Similarly as for strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree. We show that our generalisation has several useful properties of the standard LZ77 factorisation: it can be computed in polynomial time and its simpler variant in linear time; its size is not larger than the smallest grammar for a tree; it can be transformed (in linear time) into a tree grammar of size O(rg log(n/(rg))), where n is the size of the tree, g the size of the smallest grammar for this tree and r the maximal arity of the nodes in the tree, which matches a recent bound of Jez and Lohrey [STACS 2014], but with a simpler and more modular proof.

Cite as

Pawel Gawrychowski and Artur Jez. LZ77 Factorisation of Trees. In 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 65, pp. 35:1-35:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.FSTTCS.2016.35,
  author =	{Gawrychowski, Pawel and Jez, Artur},
  title =	{{LZ77 Factorisation of Trees}},
  booktitle =	{36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)},
  pages =	{35:1--35:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-027-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{65},
  editor =	{Lal, Akash and Akshay, S. and Saurabh, Saket and Sen, Sandeep},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2016.35},
  URN =		{urn:nbn:de:0030-drops-68700},
  doi =		{10.4230/LIPIcs.FSTTCS.2016.35},
  annote =	{Keywords: Tree grammars, Grammar compression, LZ77, SLP, Tree compression}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.5

Faster Longest Common Extension Queries in Strings over General Alphabets

Authors: Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(q log log n + n log* n) time making only O(q + n) symbol comparisons. Consequently, all runs in a string over a general ordered alphabets can be computed in O(n log log n) time making O(n) symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who designed an algorithm with O(n log^⅔ n) running time and conjectured that O(n) time is possible. Our paper makes a significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to O(q log n + n log* n). The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).

Cite as

Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster Longest Common Extension Queries in Strings over General Alphabets. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.5,
  author =	{Gawrychowski, Pawel and Kociumaka, Tomasz and Rytter, Wojciech and Walen, Tomasz},
  title =	{{Faster Longest Common Extension Queries in Strings over General Alphabets}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.5},
  URN =		{urn:nbn:de:0030-drops-60810},
  doi =		{10.4230/LIPIcs.CPM.2016.5},
  annote =	{Keywords: longest common extension, longest common prefix, maximal repetitions, difference cover}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.18

Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams

Authors: Pawel Gawrychowski, Oleg Merkurev, Arseny Shur, and Przemyslaw Uznanski

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length n. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bounds of Omega(M log min {|Sigma|, M}) bits of memory; here M=n/E for approximating the answer with additive error E, and M= log n / log (1 + epsilon) for approximating the answer with multiplicative error (1 + epsilon). Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use O(M) words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled.

Cite as

Pawel Gawrychowski, Oleg Merkurev, Arseny Shur, and Przemyslaw Uznanski. Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.18,
  author =	{Gawrychowski, Pawel and Merkurev, Oleg and Shur, Arseny and Uznanski, Przemyslaw},
  title =	{{Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.18},
  URN =		{urn:nbn:de:0030-drops-60765},
  doi =		{10.4230/LIPIcs.CPM.2016.18},
  annote =	{Keywords: streaming algorithms, space lower bounds, real-time algorithms, palin- dromes, Monte Carlo algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.25

The Nearest Colored Node in a Tree

Authors: Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

We start a systematic study of data structures for the nearest colored node problem on trees. Given a tree with colored nodes and weighted edges, we want to answer queries (v,c) asking for the nearest node to node v that has color c. This is a natural generalization of the well-known nearest marked ancestor problem. We give an O(n)-space O(log log n)-query solution and show that this is optimal. We also consider the dynamic case where updates can change a node's color and show that in O(n) space we can support both updates and queries in O(log n) time. We complement this by showing that O(polylog n) update time implies Omega(log n \ log log n) query time. Finally, we consider the case where updates can change the edges of the tree (link-cut operations). There is a known (top-tree based) solution that requires update time that is roughly linear in the number of colors. We show that this solution is probably optimal by showing that a strictly sublinear update time implies a strictly subcubic time algorithm for the classical all pairs shortest paths problem on a general graph. We also consider versions where the tree is rooted, and the query asks for the nearest ancestor/descendant of node v that has color c, and present efficient data structures for both variants in the static and the dynamic setting.

Cite as

Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann. The Nearest Colored Node in a Tree. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 25:1-25:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.25,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Mozes, Shay and Weimann, Oren},
  title =	{{The Nearest Colored Node in a Tree}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{25:1--25:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.25},
  URN =		{urn:nbn:de:0030-drops-60674},
  doi =		{10.4230/LIPIcs.CPM.2016.25},
  annote =	{Keywords: Marked ancestor, Vertex-label distance oracles, Nearest colored descend- ant, Top-trees}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2016.9

Randomized Algorithms for Finding a Majority Element

Authors: Pawel Gawrychowski, Jukka Suomela, and Przemyslaw Uznanski

Published in: LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)

Abstract

Given n colored balls, we want to detect if more than n/2 of them have the same color, and if so find one ball with such majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only 2n comparisons while using only a logarithmic number of bits for bookkeeping. The resulting algorithm is called the Boyer-Moore majority vote algorithm. It is known that any deterministic method needs 3n/2-2 comparisons in the worst case, and this is tight. However, it is not clear what is the required number of comparisons if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only 7n/6+o(n) comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least 1.019n.

Cite as

Pawel Gawrychowski, Jukka Suomela, and Przemyslaw Uznanski. Randomized Algorithms for Finding a Majority Element. In 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.SWAT.2016.9,
  author =	{Gawrychowski, Pawel and Suomela, Jukka and Uznanski, Przemyslaw},
  title =	{{Randomized Algorithms for Finding a Majority Element}},
  booktitle =	{15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-011-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{53},
  editor =	{Pagh, Rasmus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016.9},
  URN =		{urn:nbn:de:0030-drops-60273},
  doi =		{10.4230/LIPIcs.SWAT.2016.9},
  annote =	{Keywords: majority, randomized algorithms, lower bounds}
}

Document

DOI: 10.4230/LIPIcs.STACS.2016.39

Efficiently Finding All Maximal alpha-gapped Repeats

Authors: Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

Abstract

For alpha >=1, an alpha-gapped repeat in a word w is a factor uvu of w such that |uv| <= alpha * |u|; the two occurrences of a factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right nor to the left. We show that the number of all maximal alpha-gapped repeats occurring in words of length n is upper bounded by 18 * alpha * n, allowing us to construct an algorithm finding all maximal alpha-gapped repeats of a word on an integer alphabet of size n^{O}(1)} in {O}(alpha * n) time. This result is optimal as there are words that have Theta(alpha * n) maximal alpha-gapped repeats. Our techniques can be extended to get comparable results in the case of alpha-gapped palindromes, i.e., factors uvu^{T} with |uv| <= alpha |u|.

Cite as

Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently Finding All Maximal alpha-gapped Repeats. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 39:1-39:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2016.39,
  author =	{Gawrychowski, Pawel and I, Tomohiro and Inenaga, Shunsuke and K\"{o}ppl, Dominik and Manea, Florin},
  title =	{{Efficiently Finding All Maximal alpha-gapped Repeats}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{39:1--39:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.39},
  URN =		{urn:nbn:de:0030-drops-57408},
  doi =		{10.4230/LIPIcs.STACS.2016.39},
  annote =	{Keywords: combinatorics on words, counting algorithms}
}

Document

DOI: 10.4230/LIPIcs.STACS.2014.337

Testing Generalised Freeness of Words

Authors: Pawel Gawrychowski, Florin Manea, and Dirk Nowotka

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

Abstract

Pseudo-repetitions are a natural generalisation of the classical notion of repetitions in sequences: they are the repeated concatenation of a word and its encoding under a certain morphism or antimorphism (anti-/morphism, for short). We approach the problem of deciding efficiently, for a word w and a literal anti-/morphism f, whether w contains an instance of a given pattern involving a variable x and its image under f, i.e., f(x). Our results generalise both the problem of finding fixed repetitive structures (e.g., squares, cubes) inside a word and the problem of finding palindromic structures inside a word. For instance, we can detect efficiently a factor of the form xx^Rxxx^R, or any other pattern of such type. We also address the problem of testing efficiently, in the same setting, whether the word w contains an arbitrary pseudo-repetition of a given exponent.

Cite as

Pawel Gawrychowski, Florin Manea, and Dirk Nowotka. Testing Generalised Freeness of Words. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 337-349, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2014.337,
  author =	{Gawrychowski, Pawel and Manea, Florin and Nowotka, Dirk},
  title =	{{Testing Generalised Freeness of Words}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{337--349},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.337},
  URN =		{urn:nbn:de:0030-drops-44694},
  doi =		{10.4230/LIPIcs.STACS.2014.337},
  annote =	{Keywords: Stringology, Pattern matching, Repetition, Pseudo-repetition}
}

Document

DOI: 10.4230/LIPIcs.STACS.2013.257

Finding Pseudo-repetitions

Authors: Pawel Gawrychowski, Florin Manea, Robert Mercas, Dirk Nowotka, and Catalin Tiseanu

Published in: LIPIcs, Volume 20, 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)

Abstract

Pseudo-repetitions are a natural generalization of the classical notion of repetitions in sequences. We solve fundamental algorithmic questions on pseudo-repetitions by application of insightful combinatorial results on words. More precisely, we efficiently decide whether a word is a pseudo-repetition and find all the pseudo-repetitive factors of a word.

Cite as

Pawel Gawrychowski, Florin Manea, Robert Mercas, Dirk Nowotka, and Catalin Tiseanu. Finding Pseudo-repetitions. In 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013). Leibniz International Proceedings in Informatics (LIPIcs), Volume 20, pp. 257-268, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2013.257,
  author =	{Gawrychowski, Pawel and Manea, Florin and Mercas, Robert and Nowotka, Dirk and Tiseanu, Catalin},
  title =	{{Finding Pseudo-repetitions}},
  booktitle =	{30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)},
  pages =	{257--268},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-50-7},
  ISSN =	{1868-8969},
  year =	{2013},
  volume =	{20},
  editor =	{Portier, Natacha and Wilke, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2013.257},
  URN =		{urn:nbn:de:0030-drops-39394},
  doi =		{10.4230/LIPIcs.STACS.2013.257},
  annote =	{Keywords: Stringology, Pattern matching, Repetition, Pseudo-repetition}
}

Document

DOI: 10.4230/LIPIcs.STACS.2012.624

Tying up the loose ends in fully LZW-compressed pattern matching

Authors: Pawel Gawrychowski

Published in: LIPIcs, Volume 14, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)

Abstract

We consider a natural generalization of the classical pattern matching problem: given compressed representations of a pattern p[1..M] and a text t[1..N] of sizes m and n, respectively, does p occur in t? We develop an optimal linear time solution for the case when p and t are compressed using the LZW method. This improves the previously known O((n+m)log(n+m)) time solution of Gasieniec and Rytter, and essentially closes the line of research devoted to tudying LZW-compressed exact pattern matching.

Cite as

Pawel Gawrychowski. Tying up the loose ends in fully LZW-compressed pattern matching. In 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012). Leibniz International Proceedings in Informatics (LIPIcs), Volume 14, pp. 624-635, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

Copy BibTex To Clipboard

@InProceedings{gawrychowski:LIPIcs.STACS.2012.624,
  author =	{Gawrychowski, Pawel},
  title =	{{Tying up the loose ends in fully LZW-compressed pattern matching}},
  booktitle =	{29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)},
  pages =	{624--635},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-35-4},
  ISSN =	{1868-8969},
  year =	{2012},
  volume =	{14},
  editor =	{D\"{u}rr, Christoph and Wilke, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2012.624},
  URN =		{urn:nbn:de:0030-drops-33975},
  doi =		{10.4230/LIPIcs.STACS.2012.624},
  annote =	{Keywords: pattern matching, compression, Lempel-Ziv-Welch}
}

Document

DOI: 10.4230/DagSemProc.09281.4

Minimax Trees in Linear Time with Applications

Authors: Pawel Gawrychowski and Travis Gagie

Published in: Dagstuhl Seminar Proceedings, Volume 9281, Search Methodologies (2009)

Abstract

A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, $O (n log n)$-time algorithm for building them. Drmota and Szpankowski (2002) gave another $O (n log n)$-time algorithm, which takes linear time when the weights are already sorted by their fractional parts. In this paper we give the first linear-time algorithm for building minimax trees for unsorted real weights.

Cite as

Pawel Gawrychowski and Travis Gagie. Minimax Trees in Linear Time with Applications. In Search Methodologies. Dagstuhl Seminar Proceedings, Volume 9281, pp. 1-11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:DagSemProc.09281.4,
  author =	{Gawrychowski, Pawel and Gagie, Travis},
  title =	{{Minimax Trees in Linear Time with Applications}},
  booktitle =	{Search Methodologies},
  pages =	{1--11},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2009},
  volume =	{9281},
  editor =	{Rudolf Ahlswede and Ferdinando Cicalese and Ugo Vaccaro},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.09281.4},
  URN =		{urn:nbn:de:0030-drops-22421},
  doi =		{10.4230/DagSemProc.09281.4},
  annote =	{Keywords: Data structures, data compression, prefix-free coding}
}

Search Results

Documents authored by Gawrychowski, Pawel

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message